Interview Focus Areas
The Infosys Data Engineer interview revolved around the following key areas:
➜ Project deep dive
➜ ETL pipeline design and orchestration
➜ PySpark DataFrame fundamentals
➜ SQL query writing and optimization
➜ Practical problem-solving
Project and ETL Discussion Questions
The interview began with a detailed discussion around my current project.
Questions asked included:
➜ Tell me about your project.
➜ What is your source system, and how many types of source systems are involved?
➜ How many jobs are there in your project?
➜ How many ETL pipelines have you built or maintained?
➜ How do you schedule your pipelines, and which scheduler do you use?
➜ Have you faced any difficulties in your project? If yes, explain how you handled them.
This section focused heavily on ownership, scale, and real-world exposure to data pipelines.
SQL Questions
The interviewer tested SQL fundamentals and reasoning ability using scenario-based questions.
Questions asked included:
➜ If you have one query written using a CTE and another using a subquery, which one would you prefer and why?
➜ What are the differences between CTEs and subqueries?
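To make the CTE versus subquery comparison concrete, here is a minimal sketch that runs the same logic both ways and checks that the results match. It uses Python's built-in sqlite3 module purely for convenience; the employees table, its rows, and the "above department average" question are assumptions made up for illustration, not something asked in the interview.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "ENG", 120),
        (2, "Ravi", "ENG", 100),
        (3, "Kiran", "ENG", 80),
        (4, "Dev", "OPS", 90),
        (5, "Lina", "OPS", 70),
    ],
)

# Version 1: the intermediate result gets a name up front (CTE).
CTE_QUERY = """
WITH dept_avg AS (
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
)
SELECT e.name
FROM employees AS e
JOIN dept_avg AS d ON e.dept = d.dept
WHERE e.salary > d.avg_salary
ORDER BY e.name
"""

# Version 2: the same intermediate result inlined as a subquery (derived table).
SUBQUERY_QUERY = """
SELECT e.name
FROM employees AS e
JOIN (
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
) AS d ON e.dept = d.dept
WHERE e.salary > d.avg_salary
ORDER BY e.name
"""

cte_rows = conn.execute(CTE_QUERY).fetchall()
subquery_rows = conn.execute(SUBQUERY_QUERY).fetchall()

# Both forms return the employees paid above their department average.
print(cte_rows)
print(subquery_rows)
conn.close()
```

Both queries return the same rows, so the choice is mostly about readability and reuse: a CTE can be referenced more than once in the same statement and can be recursive, while a subquery keeps a one-off filter inline. Whether one is faster depends on the database engine, so it is safer to say "check the execution plan" than to claim a fixed winner.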
SQL Coding Question
➜ Find the second highest salary of each department along with the employee name.
➜ Table structure: (id, name, dept, salary)
This problem tested grouping, ranking logic, and the ability to use window functions or alternative approaches correctly.
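One possible answer to the coding question is sketched below, using DENSE_RANK so that employees tied on the second highest salary are all returned. It runs through Python's sqlite3 module and assumes a SQLite build with window function support (3.25 or newer); the table name employees and the sample rows are assumptions, since the interview only specified the columns (id, name, dept, salary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows: ENG has a tie for second place, HR has one employee.
        (1, "Asha", "ENG", 120),
        (2, "Ravi", "ENG", 100),
        (3, "Meera", "ENG", 100),
        (4, "Kiran", "ENG", 80),
        (5, "Dev", "OPS", 90),
        (6, "Lina", "OPS", 70),
        (7, "Omar", "HR", 60),
    ],
)

SECOND_HIGHEST = """
WITH ranked AS (
    SELECT name,
           dept,
           salary,
           -- DENSE_RANK gives tied salaries the same rank without gaps,
           -- so rank 2 is always the second distinct salary in the department.
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT dept, name, salary
FROM ranked
WHERE rnk = 2
ORDER BY dept, name
"""

second_highest = conn.execute(SECOND_HIGHEST).fetchall()
for dept, name, salary in second_highest:
    print(dept, name, salary)
conn.close()
```

With this sample data, ENG returns both Meera and Ravi (tied at 100), OPS returns Lina, and HR returns nothing because it has only one salary. It is worth stating the tie-handling choice out loud in an interview: ROW_NUMBER would pick a single arbitrary row per department, and RANK would skip rank 2 entirely when two employees tie for first.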
PySpark Questions
The PySpark portion focused on DataFrame fundamentals and common transformations.
Questions asked included:
➜ Which do you prefer: DataFrame or RDD, and why?
➜ How do you define the structure (schema) of a DataFrame in PySpark?
➜ How do you read data from a Parquet file in PySpark?
➜ How do you add a new column to a DataFrame in PySpark?
These questions evaluated practical PySpark usage rather than theoretical Spark internals.
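The three how-to questions above (defining a schema, reading Parquet, adding a column) can be sketched together as follows. This is a rough illustration, not the answer given in the interview: the column types, the sample rows, the bonus column, and the function names schema_ddl and demo are all assumptions. The PySpark imports sit inside demo so the small schema helper can be used without a Spark installation.

```python
# Column layout borrowed from the SQL coding question; the types are assumed.
EMPLOYEE_COLUMNS = [
    ("id", "INT"),
    ("name", "STRING"),
    ("dept", "STRING"),
    ("salary", "DOUBLE"),
]


def schema_ddl(columns):
    """Render (name, type) pairs as a DDL-style schema string."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def demo(parquet_path):
    """Write a tiny Parquet dataset, read it back, and add a derived column.

    Requires pyspark and a local Java runtime.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        DoubleType, IntegerType, StringType, StructField, StructType,
    )

    spark = SparkSession.builder.master("local[1]").appName("schema-demo").getOrCreate()
    rows = [(1, "Asha", "ENG", 120.0), (2, "Ravi", "OPS", 90.0)]

    # Option 1: an explicit schema built from StructType and StructField.
    schema = StructType([
        StructField("id", IntegerType(), nullable=False),
        StructField("name", StringType(), nullable=True),
        StructField("dept", StringType(), nullable=True),
        StructField("salary", DoubleType(), nullable=True),
    ])
    df = spark.createDataFrame(rows, schema=schema)

    # Option 2: the same column layout expressed as a DDL string.
    df_from_ddl = spark.createDataFrame(rows, schema=schema_ddl(EMPLOYEE_COLUMNS))
    df_from_ddl.printSchema()

    # Parquet files store their schema, so none is passed when reading.
    df.write.mode("overwrite").parquet(parquet_path)
    loaded = spark.read.parquet(parquet_path)

    # withColumn returns a new DataFrame; `loaded` itself is not modified.
    with_bonus = loaded.withColumn("bonus", F.col("salary") * F.lit(0.10))
    with_bonus.show()

    columns = with_bonus.columns
    spark.stop()
    return columns
```

Calling demo with a writable path, for example demo("/tmp/employees_parquet"), runs the whole flow locally. On the DataFrame versus RDD question, the usual argument for DataFrames is that the schema lets Spark's optimizer plan the query, which is also why defining the schema explicitly instead of relying on inference is a common follow-up.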
Final Thoughts
The Infosys Data Engineer interview emphasized clarity around project experience, strong fundamentals in SQL and PySpark, and practical knowledge of ETL pipeline design and scheduling. Candidates preparing for similar roles should focus on understanding their projects end-to-end, be comfortable explaining design choices, and practice writing SQL queries involving grouping and ranking.
This interview reinforced that Infosys values hands-on experience and the ability to clearly articulate how data engineering solutions are built and maintained in real-world environments.